\chapter{Static Background Error Covariance}
\label{ch:background-error}

This chapter provides a comprehensive examination of static background error covariance implementation in the GSI system, covering the mathematical theory, computational methods, and practical applications of background error modeling in variational data assimilation.

\section{Theoretical Foundation}
\label{sec:background-theory}

\subsection{Background Error Covariance Definition}

The background error covariance matrix $\mathbf{B}$ represents one of the most critical components in variational data assimilation, fundamentally determining the analysis quality by controlling the impact ratio, spatial distribution, and inter-variable relationships of analysis increments.

Since most data assimilation backgrounds consist of model forecasts from a prior time step, the background error covariance matrix can be formally defined as:

\begin{equation}
\mathbf{B} = \mathbb{E}[(\mathbf{x}^f - \mathbf{x}^t)(\mathbf{x}^f - \mathbf{x}^t)^T]
\label{eq:background-error-def}
\end{equation}

where $\mathbf{x}^f$ represents the forecast (background) state, $\mathbf{x}^t$ denotes the true atmospheric state, and $\mathbb{E}[\cdot]$ indicates the expectation operator.

Since the true atmospheric state $\mathbf{x}^t$ is unknown in practice, forecast errors must be estimated using statistical methods. The two primary approaches are:

\begin{itemize}
\item \textbf{NMC Method}: Forecast errors are estimated using differences between forecasts of different lengths (typically 12 and 24 hours) valid for the same time:
\begin{equation}
\mathbf{B}_{NMC} \approx \mathbb{E}[(\mathbf{x}^{f,12h} - \mathbf{x}^{f,24h})(\mathbf{x}^{f,12h} - \mathbf{x}^{f,24h})^T]
\end{equation}

\item \textbf{Ensemble Method}: Forecast errors are estimated using ensemble perturbations:
\begin{equation}
\mathbf{B}_{ens} \approx \frac{1}{K-1} \sum_{k=1}^{K}(\mathbf{x}_k^f - \overline{\mathbf{x}}^f)(\mathbf{x}_k^f - \overline{\mathbf{x}}^f)^T
\end{equation}
where $K$ is the ensemble size and $\overline{\mathbf{x}}^f$ is the ensemble mean.
\end{itemize}

\subsection{Dimensionality Challenge}

The full background error covariance matrix $\mathbf{B}$ for realistic atmospheric models has dimensions on the order of $10^6 \times 10^6$ to $10^7 \times 10^7$, making direct storage and manipulation computationally infeasible. This dimensional challenge necessitates sophisticated decomposition and modeling techniques.

\subsection{Control Variable Transformation}

To address the dimensionality problem, GSI employs a carefully designed set of analysis control variables that minimize cross-correlations and facilitate efficient representation of the background error covariance structure. The standard control variables are:

\begin{align}
\psi &\quad \text{(streamfunction)} \\
\chi_{u} &\quad \text{(unbalanced velocity potential)} \\
T_{u} &\quad \text{(unbalanced virtual temperature)} \\
q &\quad \text{(specific humidity)} \\
p_{s,u} &\quad \text{(unbalanced surface pressure)} \\
c_w &\quad \text{(cloud water mixing ratio)}
\end{align}

This choice of variables exploits the natural balance relationships in atmospheric dynamics, significantly reducing off-diagonal terms in $\mathbf{B}$.

\section{Mathematical Decomposition}
\label{sec:mathematical-decomposition}

\subsection{Matrix Factorization}

GSI implements the background error covariance through a sophisticated factorization:

\begin{equation}
\mathbf{B} = \mathbf{B}_{balance} \mathbf{V} \mathbf{B}_Z (\mathbf{B}_x \mathbf{B}_y \mathbf{B}_y \mathbf{B}_x) \mathbf{B}_Z \mathbf{V} \mathbf{B}_{balance}^T
\label{eq:b-matrix-decomposition}
\end{equation}

where each factor represents a specific physical or mathematical operation:

\begin{itemize}
\item $\mathbf{B}_{balance}$: Balance relationships among different variables
\item $\mathbf{V}$: Square root of error variances
\item $\mathbf{B}_Z$: Vertical correlation structure
\item $\mathbf{B}_x, \mathbf{B}_y$: Horizontal correlation operators in zonal and meridional directions
\end{itemize}

\subsection{Balance Relationships}

The balance component $\mathbf{B}_{balance}$ implements statistical balance relationships derived from regression analysis:

\begin{align}
\chi &= \chi_u + \alpha_{\psi \rightarrow \chi}(\phi, \sigma) \psi \\
T_v &= T_{v,u} + \alpha_{\psi \rightarrow T_v}(\phi, \sigma) \psi \\
p_s &= p_{s,u} + \alpha_{\psi \rightarrow p_s}(\phi) \psi
\end{align}

where $\phi$ denotes latitude and $\sigma$ represents the model vertical coordinate. The regression coefficients $\alpha$ are computed offline using large datasets spanning typically one month or more.

\subsubsection{Balance Coefficient Arrays}

The regression coefficients are stored in the following arrays within GSI:

\begin{itemize}
\item \textbf{agvi}(0:mlat+1,1:nsig,1:nsig): Stream function to temperature regression coefficients
\item \textbf{wgvi}(0:mlat+1,1:nsig): Stream function to surface pressure regression coefficients  
\item \textbf{bvi}(0:mlat+1,1:nsig): Stream function to velocity potential regression coefficients
\end{itemize}

where \texttt{mlat} is the number of latitudes in the original background error domain and \texttt{nsig} is the number of vertical levels.

\subsection{Variance Structure}

The variance component $\mathbf{V}$ represents the square root of background error variances for each control variable:

\begin{equation}
\mathbf{V} = \text{diag}(\sigma_{\psi}, \sigma_{\chi_u}, \sigma_{T_{v,u}}, \sigma_q, \sigma_{p_{s,u}}, \sigma_{c_w})
\end{equation}

These variances exhibit strong dependence on:
\begin{itemize}
\item Geographic location (latitude/longitude)
\item Vertical level (pressure/sigma coordinate)
\item Season and climate regime
\item Model resolution and physics
\end{itemize}

\subsubsection{Variance Arrays}

GSI stores variance information in:
\begin{itemize}
\item \textbf{corz}(1:mlat,1:nsig,1:nc3d): Square root of variance for 3D variables
\item \textbf{corp}(1:mlat,nc2d): Square root of variance for 2D variables
\end{itemize}

\subsection{Correlation Length Scales}

The correlation structure is characterized by horizontal and vertical length scales that determine the spatial influence of observations on the analysis.

\subsubsection{Horizontal Length Scales}

Horizontal correlation length scales vary with:
\begin{itemize}
\item Geographic location (shorter over land, longer over oceans)
\item Variable type (longer for mass fields, shorter for wind fields)
\item Vertical level (generally increasing with height)
\item Topographic complexity
\end{itemize}

The arrays storing horizontal length scale information include:
\begin{itemize}
\item \textbf{hwll}(0:mlat+1,1:nsig,1:nc3d): Horizontal length scales for 3D variables
\item \textbf{hwllp}(0:mlat+1,nc2d): Horizontal length scales for 2D variables
\end{itemize}

\subsubsection{Vertical Length Scales}

Vertical correlation scales reflect the layered structure of the atmosphere:
\begin{itemize}
\item \textbf{vz}(1:nsig,0:mlat+1,1:nc3d): Vertical length scales for all variables
\end{itemize}

These scales typically show:
\begin{itemize}
\item Smaller values in the boundary layer due to strong vertical gradients
\item Larger values in the free troposphere where stratification is weaker
\item Variable-dependent characteristics reflecting different physical processes
\end{itemize}

\section{Background Error File Processing}
\label{sec:file-processing}

\subsection{Standard Background Error Files}

GSI distribution includes several pre-computed background error files for different applications:

\begin{itemize}
\item \textbf{nam\_nmmstat\_na.gcv}: Regional background error statistics computed using NAM model forecasts covering North America. Contains 93 latitude lines from -2.5° to 89.5° with 60 vertical sigma levels from 0.9975289 to 0.01364.

\item \textbf{nam\_glb\_berror.f77.gcv}: Global background error statistics based on GFS model. Covers 192 latitude lines from -90° to 90° with 42 vertical sigma levels from 0.99597 to 0.013831.

\item \textbf{new\_rtma\_regional\_nmm\_berror.f77.gcv}: Background error matrix specifically designed for Real-Time Mesoscale Analysis (RTMA) applications.
\end{itemize}

\subsubsection{File Format and Endianness}

These files are stored in binary format with both Big Endian and Little Endian versions available:
\begin{itemize}
\item Default: Big Endian binary files (compatible with PGI and Intel compilers)
\item Alternative: Little Endian versions (for gfortran and other compilers)
\end{itemize}

\subsection{Regional vs. Global Error Statistics}

\subsubsection{Global Applications}

For global applications, all background error parameters exhibit latitude dependence:
\begin{itemize}
\item Regression coefficients vary with latitude to account for planetary-scale dynamics
\item Variance and length scale parameters adapt to meridional atmospheric structure
\item Spectral representation facilitates efficient computation on global grids
\end{itemize}

\subsubsection{Regional Applications}

Regional background error statistics show selective latitude dependence:
\begin{itemize}
\item \textbf{Latitude-dependent parameters}:
  \begin{itemize}
  \item Velocity potential regression coefficients
  \item Variance parameters for all control variables
  \item Horizontal length scales for all control variables
  \end{itemize}
\item \textbf{Latitude-independent parameters}:
  \begin{itemize}
  \item Surface pressure regression coefficients
  \item Temperature regression coefficients
  \item Vertical length scales for all variables
  \end{itemize}
\end{itemize}

\subsection{Vertical Interpolation}

Background error statistics are read at their original sigma levels and interpolated vertically using logarithmic sigma coordinates:

\begin{equation}
\ln(\sigma_{analysis}) = f(\ln(\sigma_{original}))
\end{equation}

This interpolation is performed by specialized subroutines:
\begin{itemize}
\item \textbf{berror\_read\_wgt}, \textbf{berror\_read\_wgt\_reg}: Interpolate variance and length scale information
\item \textbf{berror\_read\_bal}, \textbf{berror\_read\_bal\_reg}: Interpolate balance coefficients
\end{itemize}

\section{Horizontal Interpolation and Grid Adaptation}
\label{sec:horizontal-interpolation}

\subsection{Interpolation Strategy}

GSI includes built-in mechanisms to interpolate background error statistics from their native grid to any desired analysis grid. This capability enables:
\begin{itemize}
\item Use of pre-computed statistics on different grid configurations
\item Adaptation to varying model resolutions
\item Consistent error modeling across different domains
\end{itemize}

\subsection{Balance Coefficient Interpolation}

Horizontal interpolation of regression coefficients is handled by:
\begin{itemize}
\item \textbf{prebal}: Global applications (balmod.f90)
\item \textbf{prebal\_reg}: Regional applications (balmod.f90)
\end{itemize}

Interpolated coefficients are stored in:
\begin{itemize}
\item \textbf{Global}: bvz, agvz, wgvz arrays
\item \textbf{Regional}: bvk, agvk, wgvk arrays
\end{itemize}

\subsection{Variance and Length Scale Processing}

The subroutines \textbf{prewgt\_reg} and \textbf{prewgt} perform:
\begin{itemize}
\item Horizontal interpolation of variance and length scale parameters
\item Application of tuning parameters from gsiparm.anl and anavinfo.txt:
  \begin{itemize}
  \item \textbf{vs}: Variance scaling factors
  \item \textbf{hzscl}: Horizontal length scale scaling factors
  \item \textbf{as3d}: 3D variable amplitude scaling
  \item \textbf{as2d}: 2D variable amplitude scaling
  \end{itemize}
\end{itemize}

Processed information is transformed into analysis-ready arrays:
\begin{itemize}
\item \textbf{slw}, \textbf{sli}: Length scale parameters for recursive filters
\item \textbf{dssv}: 3D variance array with dimensions (lat, lon, nsig, variables)
\item \textbf{dssvs}: 2D variance array with dimensions (lat, lon, variables)
\end{itemize}

\section{Background Error Covariance Application}
\label{sec:covariance-application}

\subsection{Operational Implementation}

The background error covariance is applied through the fundamental relationship:

\begin{equation}
\nabla_y J = \mathbf{B} \nabla_x J
\end{equation}

implemented by subroutine \texttt{bkerror(gradx, grady)}. This operation transforms gradients from analysis space to control space, serving as the essential preconditioning step in the conjugate gradient minimization.

\subsection{Three-Step Application Process}

\subsubsection{Step 1: Adjoint Balance Transformation}

The adjoint of the balance equation $\mathbf{B}_{balance}^T$ is applied by calling \texttt{tbalance}:

\begin{align}
\nabla_{\psi} J &= \nabla_{\psi} J + \alpha_{\psi \rightarrow \chi}^T \nabla_{\chi_u} J + \alpha_{\psi \rightarrow T_v}^T \nabla_{T_{v,u}} J + \alpha_{\psi \rightarrow p_s}^T \nabla_{p_{s,u}} J \\
\nabla_{\chi_u} J &= \nabla_{\chi_u} J \\
\nabla_{T_{v,u}} J &= \nabla_{T_{v,u}} J \\
\nabla_{p_{s,u}} J &= \nabla_{p_{s,u}} J
\end{align}

\subsubsection{Step 2: Correlation and Variance Application}

The subroutine \texttt{bkgcov} implements the core correlation and variance operations:

\begin{enumerate}
\item \textbf{Variance Multiplication}: Apply $\mathbf{V}$ via \texttt{bkgvar}
\item \textbf{Vertical Smoothing}: Apply $\mathbf{B}_Z$ via \texttt{frfhvo}
\item \textbf{Domain Transformation}: Convert from subdomain to full field via \texttt{general\_sub2grid}
\item \textbf{Horizontal Smoothing}: Apply $\mathbf{B}_x \mathbf{B}_y \mathbf{B}_y \mathbf{B}_x$ via \texttt{smoothrf}
\item \textbf{Domain Back-Transformation}: Return to subdomain via \texttt{general\_grid2sub}
\item \textbf{Vertical Smoothing}: Second application of $\mathbf{B}_Z$ via \texttt{frfhvo}
\item \textbf{Variance Multiplication}: Second application of $\mathbf{V}$ via \texttt{bkgvar}
\end{enumerate}

\subsubsection{Step 3: Forward Balance Transformation}

The balance equation $\mathbf{B}_{balance}$ is applied by calling \texttt{balance}:

\begin{align}
\chi &= \chi_u + \alpha_{\psi \rightarrow \chi} \psi \\
T_v &= T_{v,u} + \alpha_{\psi \rightarrow T_v} \psi \\
p_s &= p_{s,u} + \alpha_{\psi \rightarrow p_s} \psi
\end{align}

\subsection{Recursive Filtering Implementation}

\subsubsection{Horizontal Smoothing}

The horizontal smoothing operation \texttt{smoothrf} implements anisotropic recursive filters that can adapt to:
\begin{itemize}
\item Terrain-following coordinate systems
\item Variable grid spacing
\item Boundary conditions
\item Multiple length scales with weighted combinations
\end{itemize}

The smoothing operation at each vertical level involves calling \texttt{raxyyx} with:
\begin{itemize}
\item \textbf{hswgt}: Horizontal scale weighting factors
\item \textbf{hzscl}: Horizontal length scale tuning factors
\end{itemize}

\subsubsection{Vertical Smoothing}

Vertical correlation is applied through \texttt{frfhvo}, which implements:
\begin{itemize}
\item Recursive filtering in the vertical direction
\item Proper treatment of boundary conditions at surface and model top
\item Scale-dependent filtering based on vertical length scale parameters
\end{itemize}

\section{Tuning and Calibration}
\label{sec:tuning}

\subsection{Tuning Parameters}

GSI provides extensive tuning capabilities through namelist parameters:

\begin{itemize}
\item \textbf{vs}(1:nsig): Vertical scaling factors for variance
\item \textbf{hzscl}(1:nsig): Horizontal length scale scaling factors  
\item \textbf{as3d}(1:nc3d): Amplitude scaling for 3D variables
\item \textbf{as2d}(1:nc2d): Amplitude scaling for 2D variables
\end{itemize}

These parameters allow operational centers to adapt the background error statistics to:
\begin{itemize}
\item Local climate characteristics
\item Model-specific biases
\item Observation network density
\item Seasonal variations
\end{itemize}

\subsection{Adaptive Background Error}

\subsubsection{Hybrid Ensemble-Variational}

Modern GSI implementations incorporate ensemble-derived background error covariance:

\begin{equation}
\mathbf{B}_{hybrid} = (1-\beta) \mathbf{B}_{static} + \beta \mathbf{B}_{ensemble}
\end{equation}

where $\beta$ controls the relative weight between static and ensemble-derived components.

\subsubsection{Flow-Dependent Aspects}

Ensemble-based components introduce flow-dependent error characteristics:
\begin{itemize}
\item Anisotropic correlation structures following weather patterns
\item Enhanced cross-variable correlations during active weather
\item Adaptive length scales reflecting current atmospheric conditions
\item Time-varying balance relationships
\end{itemize}

\section{Regional Applications and RTMA}

\subsection{RTMA-Specific Considerations}

The Real-Time Mesoscale Analysis (RTMA) employs specialized background error characteristics:

\begin{itemize}
\item \textbf{Anisotropic Recursive Filters}: The namelist variable \texttt{anisotropic} under \texttt{\&ANBKGERR} must be set to \texttt{.true.}
\item \textbf{Terrain Mapping}: Background error covariances are mapped to the underlying terrain field
\item \textbf{Random Perturbations}: The file \texttt{random\_flips} contains random numbers needed for anisotropic error generation
\item \textbf{Analysis Error Estimation}: Files \texttt{bckgvar\_*} contain square-root background error covariances for analysis error evaluation
\end{itemize}

\subsection{RTMA Tuning Parameters}

Specific RTMA tuning includes:
\begin{itemize}
\item \textbf{afact0=1}: Activates anisotropic component (use afact0=0 for isotropic analysis)
\item \textbf{lsmoothterrain=.true.}: Induces terrain field smoothing before covariance computation
\end{itemize}

\section{Quality Control and Diagnostics}

\subsection{Background Error Diagnostics}

GSI provides comprehensive diagnostics for background error assessment:

\begin{itemize}
\item Variance field visualization and validation
\item Length scale distribution analysis  
\item Balance relationship verification
\item Spectral characteristics of error covariance
\end{itemize}

\subsection{Innovation Statistics}

Background error quality can be assessed through innovation statistics:

\begin{equation}
\sigma_{innovation}^2 = \sigma_{background}^2 + \sigma_{observation}^2
\end{equation}

Consistent innovation statistics indicate:
\begin{itemize}
\item Appropriate background error magnitude
\item Correct specification of observation errors
\item Unbiased background forecasts
\end{itemize}

\section{Advanced Topics}

\subsection{Multivariate Background Error}

GSI supports multivariate background error modeling for:
\begin{itemize}
\item Chemical data assimilation (aerosols, trace gases)
\item Hydrological applications (soil moisture, snow)
\item Land surface analysis (vegetation, surface fluxes)
\end{itemize}

\subsection{Scale-Dependent Background Error}

Modern developments include:
\begin{itemize}
\item Spectral decomposition of background error
\item Scale-selective hybrid weighting
\item Resolution-dependent error modeling
\item Wavelet-based error representation
\end{itemize}

\section{Summary}

The static background error covariance system in GSI represents a sophisticated implementation of atmospheric error modeling principles. The mathematical framework combines statistical rigor with computational efficiency, while the modular design facilitates adaptation to diverse applications. The comprehensive tuning capabilities enable optimization for specific model configurations and observation networks, making GSI adaptable to a wide range of operational and research requirements.

Key achievements of the GSI background error system include:
\begin{itemize}
\item Efficient handling of high-dimensional covariance matrices through factorization
\item Flexible interpolation capabilities enabling grid-independent error statistics
\item Comprehensive balance relationship modeling capturing atmospheric dynamics
\item Robust recursive filtering implementation for anisotropic correlation structures
\item Extensive diagnostic and tuning capabilities supporting operational optimization
\end{itemize}

The continued evolution toward hybrid ensemble-variational systems builds upon this solid foundation, incorporating flow-dependent error characteristics while maintaining the computational efficiency and mathematical rigor of the static framework.